[57] ABSTRACT

A method of Discrete Cosine Transform compression of a digital video image. In the method the Field Variance and Frame Variance are calculated. When the Field Variance is less than the Frame Variance, Field Discrete Cosine Transform type compression is performed. Alternatively, when the Frame Variance is less than the Field Variance, then a Frame Discrete Cosine Transform compression is performed.

1 Claim, 4 Drawing Sheets 
ADAPTIVE FIELD/FRAME ENCODING OF DISCRETE COSINE TRANSFORM

This application is a continuation of application Ser. No. 08/411,126, filed Mar. 27, 1995, now abandoned.

FIELD OF THE INVENTION

This invention relates to video encoders, especially video encoders intended to encode and further compress video signals, e.g., discrete cosine transform encoded video signals. The invention relates especially to dynamically partitionable digital video processors for digital video signal encoding. A dynamically partitionable digital video processor, as used herein, means a processor that can function as an n unit processor, e.g., a four byte wide processor, and as n 1-unit processors, e.g., as four one-byte wide processors. The method, apparatus, and system of the invention are useful in compressing video signals, as in encoding broadcast signals, cablecast signals, and digital network signals, as well as in high definition television, interactive television, multimedia, video on demand, video conferencing, and digital video recording.

BACKGROUND OF THE INVENTION

The Moving Picture Experts' Group (MPEG) MPEG-2 Draft Standard is a compression/decompression standard for digital video applications. The standard describes an encoding method that results in substantial bandwidth reduction by a subjective lossy compression followed by a lossless compression. The encoded, compressed digital video data is subsequently decompressed and decoded in an MPEG-2 Draft Standard compliant decoder.

The MPEG-2 Draft Standard is described in, e.g., C. A. Gonzales and E. Viscito, "Motion Video Adaptive Quantization In The Transform Domain," IEEE Trans Circuits Syst Video Technol, Volume 1, No. 4, December 1991, pp. 374-378, E. Viscito and C. A. Gonzales, "Encoding of Motion Video Sequences for the MPEG Environment Using Arithmetic Coding," SPIE, Vol. 1360, pp. 1572-1576, (1990), D. LeGall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, Vol. 34, No. 4, April 1991, pp. 46-58, S. Purcell and D. Galbi, "C Cube MPEG Video Processor," SPIE, v. 1659, (1992) pp. 24-29, and D. J. LeGall, "MPEG Video Compression Algorithm," Signal Process Image Commun, v. 4, n. 2, (1992), pp. 129-140, among others.

The MPEG-2 Draft Standard specifies a very high compression technique that achieves compression not achievable with intraframe coding alone, while preserving the random access advantages of pure intraframe coding. The combination of frequency domain intraframe encoding and interpolative/predictive interframe encoding of the MPEG-2 Draft Standard results in a balance between intraframe encoding alone and interframe encoding alone.

The MPEG-2 Draft Standard exploits temporal redundancy for motion compensated interpolative and predictive encoding. That is, the assumption is made that "locally" the current picture can be modelled as a translation of the picture at a previous and/or future time. "Locally" means that the amplitude and direction of the displacement are not the same everywhere in the picture.

MPEG-2 Draft Standard specifies predictive and interpolative interframe encoding and frequency domain intraframe encoding. It has block based motion compensation for the reduction of temporal redundancy, and Discrete Cosine Transform based compression for the reduction of spatial redundancy. Under MPEG-2 Draft Standard motion compensation is achieved by predictive coding, interpolative coding, and Variable Length Coded motion vectors. The information relative to motion is based on 16x16 blocks and is transmitted with the spatial information. It is compressed with Variable Length Codes, such as Huffman codes.

The MPEG-2 Draft Standard provides temporal redundancy reduction through the use of various predictive and interpolative tools. This is illustrated in FIG. 1. FIG. 1 shows three types of frames or pictures, "I" Intrapictures, "P" Predicted Pictures, and "B" Bidirectional Interpolated Pictures.

The "I" Intrapictures provide moderate compression, and are access points for random access, e.g., in the case of video tapes or CD ROMS. As a matter of convenience, one "I" Intrapicture is provided approximately every half second. The "I" Intrapicture only gets information from itself. It does not receive information from any "P" Predicted Pictures or "B" Bidirectional Interpolated Pictures. Scene cuts preferably occur at "I" Intrapictures.

"P" Predicted Pictures are coded with respect to a previous picture. "P" Predicted Pictures are used as the reference for future pictures, both "P" and "B" pictures.

"B" Bidirectional Coded pictures have the highest degree of compression. They require both a past picture and a future picture for reconstruction. "B" bidirectional pictures are never used as a reference.

Motion compensation goes to the redundancy between pictures. The formation of "P" Predicted Pictures from "I" Intrapictures and of "B" Bidirectional Coded Pictures from a pair of past and future pictures is a key feature of the MPEG-2 Draft Standard technique.

The motion compensation unit under the MPEG-2 Draft Standard is the Macroblock unit. The MPEG-2 Draft Standard Macroblocks are 16x16 pixel macroblocks. Motion information consists of one vector for forward predicted macroblocks, one vector for backward predicted macroblocks, and two vectors for bidirectionally predicted macroblocks. The motion information associated with each 16x16 macroblock is coded differentially with respect to the motion information present in the reference macroblock. In this way a 16x16 macroblock of pixels is predicted by a translation of a 16x16 macroblock of pixels from a past or future picture.

The difference between the source pixels and the predicted pixels is included in the corresponding bit stream. The decoder adds the correction term to the block of predicted pixels to produce the reconstructed block.

As described above and illustrated in FIG. 1, each 16x16 pixel block of a "P" Predicted Picture can be coded with respect to the closest previous "I" Intrapicture, or with respect to the closest previous "P" Predicted Picture.

Further, as described above and illustrated in FIG. 1, each 16x16 pixel block of a "B" Bidirectional Picture can be coded by forward prediction from the closest past "I" or "P" Picture, by backward prediction from the closest future "I" or "P" Picture, or bidirectionally, using both the closest past "I" or "P" picture and the closest future "I" or "P" picture. Full bidirectional prediction is the least noisy prediction.

Motion information is sent with each 16x16 pixel block to show what part of the reference picture is to be used as a predictor.

As noted above, motion vectors are coded differentially with respect to motion vectors of the previous adjacent block. Variable Length Coding is used to code the differential motion vector so that only a small number of bits are needed to code the motion vector in the common case, where the motion vector for a block is nearly equal to the motion vector for a preceding block.
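The differential coding of motion vectors described above can be sketched in a few lines of Python. This is a minimal illustration only. The function names are hypothetical, the predictor is assumed to start at (0, 0), and the MPEG-2 predictor reset rules and Variable Length Code tables are not modeled.

```python
from typing import List, Tuple

Vector = Tuple[int, int]  # (horizontal, vertical) displacement


def encode_motion_vectors(vectors: List[Vector]) -> List[Vector]:
    """Replace each motion vector by its difference from the preceding one.

    The predictor starts at (0, 0), which is an assumption of this sketch.
    """
    diffs: List[Vector] = []
    prev: Vector = (0, 0)
    for mv in vectors:
        diffs.append((mv[0] - prev[0], mv[1] - prev[1]))
        prev = mv
    return diffs


def decode_motion_vectors(diffs: List[Vector]) -> List[Vector]:
    """Invert encode_motion_vectors by accumulating the differences."""
    vectors: List[Vector] = []
    prev: Vector = (0, 0)
    for d in diffs:
        prev = (prev[0] + d[0], prev[1] + d[1])
        vectors.append(prev)
    return vectors


mvs = [(5, -2), (5, -2), (6, -2), (6, -1)]
print(encode_motion_vectors(mvs))  # [(5, -2), (0, 0), (1, 0), (0, 1)]
```

Neighbouring blocks that move together produce differences at or near zero, and a Variable Length Code would spend the fewest bits on exactly those values.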

Spatial redundancy is the redundancy within a picture. Because of the block based nature of the motion compensation process, described above, it was desirable for the MPEG-2 Draft Standard to use a block based method of reducing spatial redundancy. The method of choice is the Discrete Cosine Transformation, and Discrete Cosine Transform coding of the picture. Discrete Cosine Transform coding is combined with weighted scalar quantization and run length coding to achieve still further levels of compression.

The Discrete Cosine Transformation is an orthogonal transformation. Orthogonal transformations, because they have a frequency domain interpretation, are filter bank oriented. The Discrete Cosine Transformation is also localized. That is, the encoding process samples on an 8x8 spatial window, which is sufficient to compute 64 transform coefficients or sub-bands.
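The following is a minimal Python sketch of the 8x8 transform. It assumes the orthonormal two-dimensional DCT-II and evaluates it directly from the definition. That is slow, and the fast algorithms the text refers to are not shown. The name `dct_8x8` is hypothetical.

```python
import math
from typing import List

Block = List[List[float]]


def dct_8x8(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an 8x8 block, computed directly (O(N^4))."""
    n = 8

    def scale(k: int) -> float:
        # Normalisation that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency
        for v in range(n):      # horizontal frequency
            s = 0.0
            for x in range(n):      # row of the spatial window
                for y in range(n):  # column of the spatial window
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


flat = [[128.0] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)  # all 64 coefficients from one 8x8 window
```

A flat block puts all of its energy into the single DC coefficient `coeffs[0][0]` (about 1024 here), and the other 63 coefficients are zero up to rounding. This is why smooth areas leave so many zero coefficients for the later stages to exploit.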

Another advantage of the Discrete Cosine Transformation 
is that fast encoding and decoding algorithms are available. 
Additionally, the sub-band decomposition of the Discrete 
Cosine Transformation is sufficiently well behaved to allow 
effective use of psychovisual criteria. 

After transformation, many of the frequency coefficients are zero, especially the coefficients for high spatial frequencies. These coefficients are organized into a zig-zag, as shown in FIG. 2, and converted into run-amplitude (run-level) pairs. Each pair indicates the number of zero coefficients and the amplitude of the non-zero coefficient. This is coded in a Variable Length Code.
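The zig-zag reorganization and the conversion into run-level pairs can be sketched as follows. This is an illustration under simplifying assumptions. The DC coefficient is treated like any other coefficient, trailing zeros are simply dropped (in practice an end-of-block code stands in for them), and the function names are hypothetical.

```python
from typing import List, Tuple


def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 0:
            rows.reverse()  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block: List[List[int]]) -> List[Tuple[int, int]]:
    """Convert quantized coefficients into (run, level) pairs along the zig-zag.

    run = number of zeros skipped before a non-zero coefficient,
    level = the value of that coefficient.
    """
    pairs: List[Tuple[int, int]] = []
    run = 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


q = [[0] * 8 for _ in range(8)]
q[0][0], q[1][0], q[0][2] = 12, -3, 2
print(run_level_pairs(q))  # [(0, 12), (1, -3), (2, 2)]
```

Because the scan visits low frequencies first, the non-zero values cluster at the start, and the long tail of high-frequency zeros costs nothing.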

Discrete Cosine Transformation encoding is carried out in 
the three stages as shown in FIG. 2. The first stage is the 
computation of the Discrete Cosine Transformation coeffi- 
cients. The second step is the quantization of the coefficients. 

The third step is the conversion of the quantized transform coefficients into (run-amplitude) pairs after reorganization of the data into zig-zag scanning order.

Quantization enables very high degrees of compression, and a high output bit rate, and retains high picture quality.

Quantization can be adaptive, with "I" Intrapictures having fine quantization to avoid "blocking." This is important because "I" Intrapictures contain energy at all frequencies. By way of contrast, "P" and "B" pictures contain predominantly high frequency energy and can be coded at a coarser quantization.

The MPEG-2 Draft Standard specifies a layered structure of syntax and bit stream. The bit stream is separated into logically distinct entities to prevent ambiguities and facilitate decoding. The six layers are shown in Table 1, below.

TABLE 1

MPEG-2 Draft Standard Layers

Layer                      Purpose
Sequence Layer             Random Access Unit and Context
Group of Pictures Layer    Random Access Unit and Video Coding
Picture Layer              Primary Coding Unit
Slice Layer                Resynchronization Unit
Macroblock Layer           Motion Compensation Unit
Block Layer                DCT Unit

Encoding can be accomplished by hardware or by software. Hardware encoding is generally faster than software encoding. However, even hardware encoding is slow, given the bit rate of a video image and the narrow bandwidth of the transmission medium. One reason for this is the many steps required in forming the Discrete Cosine Transform, and calculating all of its coefficients.

OBJECTS OF THE INVENTION

It is one object of the invention to provide a system that increases the speed of the encoding process, especially the Discrete Cosine Transform encoding process.

It is still another object of the invention to reduce the clock cycles required for encoding a picture.

SUMMARY OF THE INVENTION

These and other objects of the invention are attained by the digital signal encoder system of the invention. The system is useful for receiving the pre-processed, partially encoded but uncompressed macroblock and forming the discrete cosine transform thereof. The processor of the invention works in conjunction with other elements of the encoder system, including a quantizer, a variable length code encoder, and a FIFO data output buffer, to provide an integrated system.

The processor of the invention is utilized in a digital video encoder processor for discrete cosine transform encoding. The discrete cosine transform encoding includes the encoding steps of (1) determining the discrete cosine transform field or frame type, (2) addressing individual pixels as either (i) vertically adjacent pixels on consecutive Odd and Even field lines, or (ii) vertically adjacent pixels on consecutive Odd field lines, then consecutive Even field lines; or (iii) vertically adjacent pixels on consecutive Even field lines, then consecutive Odd field lines. These subtractions may be performed between (i) consecutive lines, (ii) odd lines, or (iii) even lines. The next step is finding the smallest variance of the above subtractions to determine the discrete cosine transform coding type. The subtractions are carried out in a dynamically partitionable processor having a plurality of datapaths. The datapaths are partitionable by the action of running opcode into (i) a single wide datapath, and (ii) a plurality of narrow datapaths for calculating the absolute value of the difference between two pixels, and accumulating the results of the subtraction.
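The field/frame decision built from the three subtractions named above (between consecutive lines, between odd lines, and between even lines) can be sketched in Python as follows. This sketch makes several assumptions. A sum of absolute differences stands in for "variance", matching the subtract-absolute-and-accumulate approach. Every vertically adjacent pair of each kind is compared, so the operation count differs from the counts given for the processor. Ties go to the field DCT. All names are hypothetical.

```python
from typing import List, Sequence

Macroblock = List[List[int]]  # 16 rows of 16 luminance samples, in frame (interleaved) order


def row_sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two pixel rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def frame_variance(mb: Macroblock) -> int:
    """Differences between consecutive lines: 1-2, 2-3, 3-4, ..."""
    return sum(row_sad(mb[i], mb[i + 1]) for i in range(len(mb) - 1))


def field_variance(mb: Macroblock) -> int:
    """Odd-line differences (1-3, 3-5, ...) plus even-line differences (2-4, 4-6, ...).

    Row index 0 holds line 1, so the odd lines sit at even row indices.
    """
    odd = sum(row_sad(mb[i], mb[i + 2]) for i in range(0, len(mb) - 2, 2))
    even = sum(row_sad(mb[i], mb[i + 2]) for i in range(1, len(mb) - 2, 2))
    return odd + even


def choose_dct_type(mb: Macroblock) -> str:
    """Frame DCT when the frame measure is strictly smaller, otherwise field DCT."""
    return "frame" if frame_variance(mb) < field_variance(mb) else "field"


# Two fields captured at different times: odd lines dark, even lines bright.
moving = [[50] * 16 if r % 2 == 0 else [150] * 16 for r in range(16)]
# A smooth vertical ramp with no motion between the fields.
static = [[10 * r] * 16 for r in range(16)]
print(choose_dct_type(moving), choose_dct_type(static))  # field frame
```

In the moving case every consecutive-line difference is large while each field is internally uniform, so field DCT wins. In the static ramp, consecutive lines are closer than lines two apart, so frame DCT wins.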

According to a further exemplification of the invention there is provided a method of memory management in a digital image encoder to minimize memory bandwidth demands during encoding. The method is used with motion video data having temporal and spatial redundancy where chrominance and luminance data are stored temporarily. According to the method disclosed, chrominance and luminance data are stored in separate locations in memory. The luminance data is fetched from memory and is the only image data used for motion estimation. The chrominance data is fetched from memory in a chrominance-luminance pair for image reconstruction. The reconstructed image is stored in memory, and fetched from memory for motion estimation.

According to a still further exemplification of the invention there is provided a method of encoding digital video image data having luminance and chrominance components, where the chrominance components are encoded at one quarter the spatial resolution of the luminance components.
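One quarter spatial resolution means halving the chrominance plane in both directions. The Python sketch below does this by keeping only one field's chrominance lines, which halves the vertical dimension, and averaging horizontal neighbours, which halves the horizontal one. The two-tap averaging filter, its rounding, the choice of the first field, and the function name are all assumptions made for illustration. They are not the encoder's actual filter.

```python
from typing import List

Plane = List[List[int]]


def chroma_to_quarter(chroma: Plane) -> Plane:
    """Halve a chrominance plane vertically and horizontally.

    Vertical: keep only the even-indexed rows, i.e. one field of an interlaced frame.
    Horizontal: average each pair of neighbouring samples (2-tap filter, rounding half up).
    Assumes an even width and height.
    """
    one_field = chroma[0::2]
    return [
        [(row[c] + row[c + 1] + 1) // 2 for c in range(0, len(row) - 1, 2)]
        for row in one_field
    ]


cb = [
    [10, 20, 30, 40],
    [99, 99, 99, 99],  # line of the other field, discarded
    [50, 52, 54, 56],
    [99, 99, 99, 99],  # line of the other field, discarded
]
print(chroma_to_quarter(cb))  # [[15, 35], [51, 55]]
```

Discarding a whole field avoids mixing chrominance samples taken at two different times. The horizontal filter only combines samples from the same line, so no motion is involved there.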

And, according to a still further embodiment of our invention, there is provided a method of Discrete Cosine Transformation of a digital video image. In the disclosed method the Field Variance and Frame Variance are calculated. When the Field Variance is less than the Frame Variance, Field Discrete Cosine Transform type transformation is performed. Alternatively, when the Frame Variance is less than the Field Variance, then a Frame Discrete Cosine Transform transformation is performed.

According to a still further embodiment of our invention, there is provided a method for encoding bitstream headers where templates for the bitstream headers are stored in a buffer. The templates are addressable by programmable instructions, and the processor has a status register containing a bit for each header type. The status register is modifiable during the encoding process with a data pattern indicating the headers needed for encoding with the bitstream. In this way, when a bit is set to 1 the predefined header type is generated and shipped to the bitstream, the header being generated by processing the header buffer template entries associated with the header type.

According to a still further embodiment of our invention, there is provided a method of encoding a low frame rate digital video source image to a high frame rate digital video target image in an encoder, where repeat fields are introduced into a high frame rate digital video intermediate image.

THE FIGURES

FIG. 1 shows the relationship of the Intraframe, the Predicted Frames, and the Bidirectional Frames to form a Group of Pictures.

FIG. 2 is a flow chart of three stages of discrete cosine transform encoding.

FIG. 3 is a block diagram of the dynamically partitionable digital video encoder processor of the invention.

FIG. 4 shows the subtraction of pixels, e.g., between consecutive lines, between odd lines and between even lines.

DETAILED DESCRIPTION OF THE INVENTION

Every pixel in a digital video picture is represented by 1 byte of luminance and 1 byte of chrominance information. This is specified in the 4:2:2 MPEG standard. With a maximum picture size of 720 by 480 pixels and a transmission rate of 30 pictures per second, storage of the video image requires a large amount of memory. Moreover, a high bandwidth is required to transmit a video image across a transmission medium. Digital video compression is introduced to lower the memory and transmission medium bandwidth requirements. The end result of compression is a digital video image with fewer data bytes than the original picture but with as much information as possible.

The Processor

One step in video compression is to determine the quantization value per segment of the picture. The concept of quantization is to reduce the value of each pixel in the segment by a stepsize so that as many zeros as possible are created. In general, as the result of subsequent compression and encoding techniques, zeros require fewer data bits to represent. The value of the quantization factor or constant is selected based upon a human vision model. The selection of the quantization value requires computation that involves every pixel in the segment. There are 256 bytes of luminance data per macroblock in the 4:2:2 MPEG standard. To involve every pixel would require 256 operations. To speed up the computation, the operations are carried out in parallel.

The processor 11 of the invention has an instruction store 21 where microcode is stored. The processor 11 has a four byte wide arithmetic and logical unit 31 that is comprised of four one byte ALUs, 33, 35, 37, and 39. The processor 11 has a two level set of general purpose working registers, 41, a group of special purpose registers, 43, an instruction stack, and a condition register, 45.

The pipelined ALU, 31, is made up of four individual one byte ALUs, 33, 35, 37, and 39. These four specialized arithmetic and logical units, ALUs, 33, 35, 37, and 39, are the core of the arithmetic and logic operations. Processor operations occur in four pipelined cycles:

1. FETCH,
2. DECODE,
3. EXECUTE, and
4. WRITE BACK.

Microcode instructions are first fetched from the instruction store, 21, and then decoded. The ALU controller, 30, provides data/control signals from the register/memory interface unit, 47, and the instruction fetch/decode unit, 23, respectively, through the ALU control unit, 30, to the ALUs, 33, 35, 37, and 39, based on the decoded instruction and the results of the previous instructions for data pipelining.

The processor, 11, can operate on either register/memory data from the register/memory interface unit, 47, or pixel data sent to the processor on dedicated pixel buses, 49.

Branch/loop instructions are performed by a separate branch/loop processor unit, 25.

Data is processed by the ALUs, 33, 35, 37, and 39, in the EXECUTE cycle and stored to registers/memory, 41, 43, and 45, during the WRITE BACK cycle through the register/memory interface unit, 47. The processor, 11, can access a two level set of general purpose working registers, 41, and a group of special purpose registers, 43, in the processor, 11. A pixel bus, 49, is also provided for access to the registers/memory from external sources. A block diagram of the processor, 11, is shown in FIG. 3.

Each instruction is 27 bits wide. There are several instruction formats defined for this processor. A typical instruction has an opcode, a mode bit, a destination field, and 2 source fields. The opcode is used to indicate what function is to be performed by the processor.

The mode bit tells the processor how to operate on the instruction. The two modes are "UNI" and "LP". "UNI" mode operates as one four byte operation, while "LP" mode (LOGICAL PARTITION) operates as four one byte operations independent of each other. The source fields specify the location of the inputs to the operations. The destination field specifies the location to store the result of the operations.

The arithmetic and logical function unit, 31, consists of four 1 byte standalone arithmetic and logical units (ALUs), 33, 35, 37, and 39. The carry of each unit propagates to the next higher order unit if the instruction specifies a 4 byte operation.

In each arithmetic and logical unit, there is an accumulation function. The accumulator per ALU is 16 bits wide. An add accumulate instruction is architected that permits the accumulation of the addition results with the previous data in the accumulator. The add circuitry allows two 8 bit inputs to be added to a 16 bit accumulator. This function permits accumulation of up to 256 bytes of input data.

There are eight 8 by 8 multipliers installed in the processor, two per ALU. A 32 by 32 multiplication operation is also architected into the processor.
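The difference between the "UNI" and "LP" modes lies in whether carries cross the byte boundaries. The Python sketch below models a 32-bit word as an integer to illustrate this. It assumes that each LP-mode lane wraps around modulo 256 and that the subtract-absolute result fits in its byte lane. The text does not state these overflow rules, and the function names are hypothetical rather than the processor's actual opcodes.

```python
MASK32 = 0xFFFFFFFF


def add_uni(a: int, b: int) -> int:
    """UNI mode: one 4-byte add; carries ripple across byte boundaries."""
    return (a + b) & MASK32


def add_lp(a: int, b: int) -> int:
    """LP mode: four independent 1-byte adds; each lane's carry-out is dropped (assumed wraparound)."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result


def subabs_lp(a: int, b: int) -> int:
    """LP mode absolute difference: |a - b| computed separately in each byte lane."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        diff = abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF))
        result |= diff << shift
    return result


print(hex(add_uni(0x01FF00FF, 0x00010001)))    # 0x2000100
print(hex(add_lp(0x01FF00FF, 0x00010001)))     # 0x1000000
print(hex(subabs_lp(0x10203040, 0x40302010)))  # 0x30101030
```

The same 32-bit operands give different results in the two modes. In UNI mode the carry out of bytes 0 and 2 lands in bytes 1 and 3. In LP mode each byte is a separate pixel, so four pixel differences come out of one operation.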
A subset of the 32 by 32 multiplication is the 32 by 16 multiplication. Two 8 by 8 multipliers in each unit are joined to form an 8 by 16 multiplier. In addition, the two 8 by 16 multipliers of adjacent units are joined to form a single 16 by 16 multiplier. The input operands are parsed to allow HxL and LxL multiplications. The results of the two 16 by 16 multiplications are combined to form the 32 by 16 multiplication. This design allows the maximum reuse of circuitry.

One compression technique employed in MPEG2 is Discrete Cosine Transform (DCT) coding. This process is used to convert spatial domain image data into frequency domain image data. The picture image is fed to the DCT process in small subsets of 8 by 8 pixels. An 8x8 array of pixels is defined as a block.

If consecutive lines of pixels are used to feed the DCT process, this technique is defined as frame DCT type. Alternatively, if every other line of pixel image data is concatenated to form the 8 by 8 DCT input, then this technique is defined as field DCT type. In general, if there is motion between the two fields of a picture, as often occurs in interlaced pictures, then the pixel data has large differences between consecutive lines. If there is no motion between fields, then there is very little variance between consecutive lines of pixel data. Typically, the lower the pixel variance, the higher the compression efficiency that can be achieved.

The encoder determines the DCT type, e.g. field or frame, by calculating the variances of the input pixel image. The input is fetched according to the address modes set by the set addressing mode instruction. This instruction sets one of the 6 unique address modes. 'Mode 1' will address two vertically adjacent pixels on consecutive Odd and Even field lines in the macroblock. 'Mode 2' will first address two vertically adjacent pixels on consecutive Odd field lines, then switch to consecutive Even field lines. 'Mode 3' will first address two vertically adjacent pixels on consecutive Even field lines, then switch to consecutive Odd field lines. Modes 4, 5, and 6 are identical to Modes 1, 2, and 3 respectively, except that one pixel in each of the Odd or Even lines of the macroblock is addressed, instead of two. The one pixel addressing modes are not used in the DCT type calculation.

The DCT type calculation involves three different subtractions: subtraction between consecutive lines, subtraction between the odd lines and subtraction between the even lines, as shown in FIG. 4.

The smallest variance of the above subtractions is used to determine the DCT coding type. This pixel variance calculation is computation intensive, involving every pixel. There are 256 pixels in every macroblock, requiring 128 subtractions and 128 additions for the frame DCT calculation, and another 128 subtractions and 128 additions for the field DCT calculation. In a final step, the totals of the four accumulated values are added using an accumulator sum instruction, ACCSUM. A compare instruction is needed to determine which variance is smaller, frame DCT or field DCT. In processors used heretofore this decision will require 512 calculations. In the processor of the instant invention, the mode bit is used to specify 4 calculations to be carried out in one cycle. The number of cycles required to perform this calculation is improved by a factor of four, resulting in one hundred twenty eight cycles required to perform the calculation. With the combination of the subtract absolute (SUBABS) and add accumulator (ADDACC) instructions, only 64 cycles are required.

Only two instructions are needed to calculate the variances. The instructions are SUBABS and ADDACC. These two instructions form a subroutine and are looped until all pixels in the macroblock have been used. The loop is used once for the frame DCT calculation and once again for the field DCT calculation. The SUBABS instruction is used to calculate the absolute value of the difference between two pixels. The ADDACC instruction is used to accumulate the results of the SUBABS instruction.

By pipelining the result of the SUBABS into the input of the ADDACC, intermediate memory read or write instructions are not needed. In this mode of operation, the above instructions improve performance by reducing the cycles required per calculation.

The processor, 11, of the invention is utilized in a digital video encoder processor for discrete cosine transform encoding. The discrete cosine transform encoding includes the encoding steps of (1) determining the discrete cosine transform field or frame type, (2) addressing individual pixels as either (i) vertically adjacent pixels on consecutive Odd and Even field lines, or (ii) vertically adjacent pixels on consecutive Odd field lines, then consecutive Even field lines; or (iii) vertically adjacent pixels on consecutive Even field lines, then consecutive Odd field lines. These subtractions may be performed between (i) consecutive lines, (ii) odd lines, or (iii) even lines. The next step is finding the smallest variance of the above subtractions to determine the discrete cosine transform coding type. The subtractions are carried out in a dynamically partitionable processor having a plurality of datapaths, 33, 35, 37, 39. The datapaths 33, 35, 37, 39, are partitionable by the action of running opcode into (i) a single wide datapath, 31, and (ii) a plurality of narrow datapaths, 33, 35, 37, 39, for calculating the absolute value of the difference between two pixels, and accumulating the results of the subtraction.

Another compression technique following calculation of the DCT coefficients is quantization. Quantization is a process to determine the step size per macroblock. Stepsize is based on the light intensity variances of the macroblock. The average intensity of the macroblock is first calculated. Variances of each block are then determined. The smallest variance is used to select the stepsize for the macroblock. In the processor described herein, the average intensity can be calculated by ADDACC and shift instructions. The ADDACC instruction forms a subroutine of one instruction and is looped until all of the pixels in the 8 by 8 block are used. The accumulated result is divided by 64 via a shift right instruction.

The LP mode option is used for performance improvement. The addition of all luminance pixels is performed by the four ALUs, 33, 35, 37, and 39, in parallel. The average of each group is then calculated by performing a SHIFT RIGHT on the result.

The variance intensity is calculated by the SUBABS and ADDACC instructions. The SUBABS is used to determine the difference of each pixel data from the average of the block. ADDACC is used to accumulate the differences in each block. The smallest accumulation among the four blocks is used to determine the stepsize of the macroblock. By choosing LP mode, the computation of the four blocks is carried out simultaneously.

The architecture of the instructions in the processor, 11, and the execution unit design allow the dynamic partition of a single four byte dataflow to operate as one four byte dataflow unit or as four one byte execution units. The dynamic partitionable capability enhances the processor output, thereby providing a system that increases the speed of the encoding process, especially the Discrete Cosine Transform encoding process, and reduces the clock cycles required for encoding a picture.
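The per-block activity measure used for the quantization stepsize can be sketched in Python as follows. The average is an accumulate followed by a right shift by 6 (division by 64). The activity is the sum of absolute differences from that average, and the smallest of the four 8x8 block values is kept. The four blocks are processed one after another here rather than in parallel LP lanes. The text does not say how the smallest value maps to an actual stepsize, so that step is not modeled. All names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def block_average(block: Block) -> int:
    """Sum the 64 pixels of an 8x8 block (ADDACC loop), then divide by 64 with a right shift."""
    total = sum(sum(row) for row in block)
    return total >> 6


def block_activity(block: Block) -> int:
    """Accumulate |pixel - average| over the block (SUBABS feeding ADDACC)."""
    avg = block_average(block)
    return sum(abs(p - avg) for row in block for p in row)


def split_macroblock(mb: Block) -> List[Block]:
    """Split a 16x16 luminance macroblock into its four 8x8 blocks."""
    return [[row[c0:c0 + 8] for row in mb[r0:r0 + 8]] for r0 in (0, 8) for c0 in (0, 8)]


def smallest_activity(mb: Block) -> int:
    """Smallest block activity; the text uses this value to select the macroblock stepsize."""
    return min(block_activity(b) for b in split_macroblock(mb))


# A macroblock whose top-left block is flat and whose other blocks alternate 0 and 40.
mb = [[100 if (r < 8 and c < 8) else (c % 2) * 40 for c in range(16)] for r in range(16)]
print(smallest_activity(mb))  # 0
```

The right shift truncates rather than rounds. For example, a block holding the values 0 through 63 sums to 2016 and averages to 31, not 31.5.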
Method and Apparatus for Color Conversion

[...] source is either 422 or 420 format. In the YUV color space, most of the high frequency components are concentrated in the Y component. However, human visual acuity is lowest in the high frequency chrominance components, and highest in the low frequency luminance components. For this reason, high compression of video sources can be obtained if the video sources are converted into 420 format. The color conversion to 420 requires reduction in both the horizontal and vertical dimensions, producing chrominance components that are one quarter the spatial resolution of the luminance components.

In progressive video, there is no motion between fields. A linear filter can be used to reduce the chrominance components. In interlaced full motion video, there is a time difference, and thus motion, between the two fields in a frame structure picture. Because the chrominance components of two adjacent fields are very uncorrelated, a linear filter will create artifacts. Therefore motion compensation must be considered in the chrominance reduction.

Only the chrominance components of one field are used in the invention described herein. One field of chrominance information is applied to both luminance fields. No chrominance motion compensation is needed in this method. By eliminating one field, the vertical components are also reduced. Since all horizontal components of a field are within the same time interval, there is no motion, and a linear filter is used to reduce the horizontal components.

Color reduction is achieved by this economical filter and enhances video compression.

Field and Frame Encoding

One form of image compression is achieved by removal of redundant data called spatial redundancies. This form of redundancy occurs due to correlated information between adjacent pixel data.

In interlaced full motion video, there is a time difference between fields and thus there is motion between adjacent fields. This motion creates a discontinuity between adjacent lines within a frame.

The MPEG2 standard allows the use of Field and Frame Discrete Cosine Transform types to achieve better compression. Frame processing works better when there is little or no motion. There are more lines in the Frame format, thus there is more correlation for increased compressibility. However, Field format processing works better in vertical detail motion. In such cases the field data is more correlated than the frame data.

The Discrete Cosine Transform Type decision, i.e., the Field or Frame decision, is calculated for every macroblock.

The encoding method of the invention is based on the observation that if the two fields of the picture are from different times, then the data between the adjacent lines within a field will generally be closer in value than the data between adjacent lines in the frame structure. It is assumed the lines in the frame structure are numbered in ascending order such as 1, 2, 3, etc. The Frame variance is the summation of the absolute differences of line 1 and 2, line 2 and 3, line 3 and 4, etc.

The Field variance is the sum of the Field1 variance and the Field2 variance. Field1 variance is the summation of the absolute differences between adjacent odd lines in the frame structure. With the same numbering scheme as above, the Field1 variance is the summation of the absolute differences of line 1 and 3, line 3 and 5, etc. Field2 variance is calculated in the same fashion, with the exception that the even line numbers are used. Field2 variance is the summation of the absolute differences of line 2 and 4, line 4 and 6, etc.

If the Frame variance is less than the Field1 variance plus the Field2 variance, then the Frame Discrete Cosine Transform is chosen. Otherwise the Field Discrete Cosine Transform is selected.

Generation of the MPEG Header

Bitstream headers, as defined in the MPEG2 Standard, contain information concerning the attached bitstream. The syntax of the header must be precise, as it is used to inform the decoder how to reconstruct the encoded picture. The usage of headers is application dependent. A flexible design is needed to allow easy adaptation to specific application requirements. Performance is also important so that valuable time is not taken away from the picture encoding.

In the encoder of our invention, a pre-loaded template is used in combination with a set of programmable instructions to generate the header bitstream. A 190x27 bit on-chip header buffer contains the templates for header types specified in the MPEG2 standard, MP@ML. This buffer is initialized from an external processor at power on reset time.

Each header type occupies several entries in the buffer. The header generator of the invention contains a status register, writable by the processor of the invention, that works with the header buffer. The status register contains a bit for each header type. During the encoding process, the status register is modified with a data pattern indicating the headers needed. When a header command is issued, the generator processes the status register from left to right, one bit at a time. When a bit is set to 1, the predefined header type is generated and shipped to the bitstream. The header is generated by processing the header buffer template entries associated with the header type.

Each entry in the header buffer contains the following fields:

1. valid,
2. length,
3. command, and
4.

During initialization all valid bits are set "off". The valid bit is only set by microcode when the associated data is

of adjacent lines within the frame. needed in the bitstream per application. During "ship 

The encoder calculates the total differences between adja- header** processing the contents of the "data** field are put 

cent lines of the Frame structure of the macroblock and then go into the bitstream if the "valid** field is "on w . 

of adjacent lines of a Field structure within the macroblock. The 'length" field is loaded during the initialization 

If the Held total variance is less, then Held Discrete Cosine process. It is used to indicate the length of data in the "data" 

Transform type is chosen. If however, the Frame total field to be shipped to the bitstream when the 'Valid" bit is 

variance is less, then a ftame Discrete Cosine Transform "on". 

type is chosen. 65 There are three bits in the "command** field. The "com- 

The Frame variance is calculated by the surnrnatioa of the mand" field is used to inform the header generator of the 

absolute differences between adjacent lines in the frame processor of the location of data, how to generate the data, 
and how much data is to be inserted into the bitstream. The "command" codes are defined as follows:

001 ship content of data field into bitstream
010 concatenate 20 bits of zeros with the content of data field and ship into bitstream
011 ship next 64 bytes of data in data buffer into bitstream
100 ship macroblock data into bitstream
101 ship content of two consecutive data entries into the bitstream
110 ship content of data field into bitstream and reset valid bit
111 ship user data into bitstream

The 000 code is undefined.

The contents of the data field are initialized by the external processor and can later be modified by microcode.

The content of the header buffer is writable by either the external or the internal processor, which provides flexibility. The internal processor, that is, the processor of the invention, only has to set up a few registers when the ship header command is issued. The processor is then free to process other work while the header hardware builds and ships the header into the bitstream, thus improving performance.

Memory Organization

Temporal redundancies are redundant image information over time, i.e., data that is similar or repeats itself from frame to frame. Motion estimation is a process used to reduce the temporal redundancies of video images by determining the movement of objects within an image sequence. Removal of temporal redundancies is also a key part of the MPEG2 standard. However, the standard does not specify the method to accomplish motion estimation. The MPEG2 standard only specifies the headers, bitstream architecture, and protocols necessary to allow MPEG2 compliant decoding.

Motion estimation creates three challenges for the encoder: memory bandwidth, computation requirements, and noise. According to our invention, regional block matching is used for motion estimation. Regional block matching involves segmenting a frame into smaller regions and searching for the displacement that produces the best match among possible regions in a reference frame. The size of a frame is 720 pixels horizontally by 480 lines vertically, as defined in the MPEG2 standard. Since the amount of data per picture is too large to be held inside the encoder chip, it is stored in an external memory. Three main steps are required for the motion estimation function: data retrieval from memory, data computation, and prediction selection.

The amount of data retrieved from external memory is directly proportional to the size of the search window. A large search window provides a high probability of finding a closer match within a large amount of data, and therefore creates a bigger demand on memory bandwidth. The opposite is true for a small search window, which creates less demand on memory bandwidth but has a lower probability of finding a closer match.

To make the best use of memory bandwidth, the encoder memory control system and method of the invention include chrominance pair (uv pair) storing, separate luminance and chrominance locations, memory access prioritization, and physically distinct memories for luminance and chrominance.

Luminance and chrominance data are stored in separate locations. Motion occurs in both luminance and chrominance data. However, luminance and chrominance movements track each other, so to minimize computation requirements, only luminance data is used in motion estimation. When pixel data is retrieved from external memory, the memory access time depends on the Column Address Strobe (CAS) and Row Address Strobe (RAS) times. A new RAS access requires a longer delay than an adjacent CAS access within the same RAS. For this reason, the luminance data is stored separately from the chrominance data, which maximizes adjacent CAS accesses during motion estimation. The chrominance data is sent to the memory control in UV pairs, and is stored in and fetched from the DRAM in this manner to save memory bandwidth when processing a picture store or a macroblock fetch.

As defined in the MPEG2 standard, a picture is divided into smaller subimages, or macroblocks. The macroblocks are coded independently of one another. A macroblock is defined as 16 pixels horizontally by 16 lines vertically. It is further defined that the pel unit in the macroblock can be either full or half pel. Letting x and y be adjacent pixels in a picture, the half pel between them is defined as their average, (x + y)/2.

To form 16 half pels, the encoder has to retrieve 17 bytes from external memory. With an input memory bus four bytes wide, five memory accesses are needed to retrieve 17 bytes of data, and only 1 byte in the last access is useful. This applies to both luminance and chrominance (u and v). By storing u and v in pairs, only nine memory accesses are needed for one line of chrominance data. By way of comparison, ten memory accesses are needed if u and v are stored in separate memory locations.

In the encoder, the input images are accumulated until enough data is saved to start encoding. Data is then fetched from the external memory during the encoding process. The image is reconstructed from the encoded data, and this reconstructed image is also saved in external memory. It is later retrieved as reference data for motion estimation in subsequent pictures. A piece of pixel data is therefore stored and fetched several times during the encoding process. To minimize conflicts, the external memory in the encoder of the invention is physically separated into different segments. Each segment has its own controls and data path, so that several memory segments can operate simultaneously. The segments are selected based on their data flow in the encoder: the input image is saved in one memory segment and the reconstructed image is saved in another.

When fetching and storing reconstructed reference data, the DRAM saves bandwidth by prioritizing the memory accesses. The encoding parameters for the picture, such as picture type, IP or IPB mode, dual prime, and the number of reference fields, help to predict the number of fetches each unit in the refinement search path will have per macroblock. With the number of fetches per unit predicted, the memory control executes a DRAM fetch and store pattern for that picture. Patterns for all of the different encoding scenarios are predetermined and mapped into a state machine to provide maximum data flow through the chip. Maximum data flow is achieved by mapping out DRAM fetches and stores so that each unit receives data when it needs it, and the reference data is stored back to DRAM when it has been determined that the refinement
units will be busy with previously fetched data. Memory accesses are streamlined as much as possible to prevent pauses in processing a macroblock due to units having to wait for the memory control to finish another fetch or a macroblock store before receiving their data.
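The memory access counts given above for half-pel retrieval can be checked with a short calculation. The following is a minimal sketch. It assumes that each line of data starts on a 4-byte bus boundary, which matches the statement that only 1 byte of the last access is useful. It also assumes that half pels are rounded up, which is an illustrative choice. The names `bus_accesses` and `half_pels` are hypothetical.

```python
import math

BUS_WIDTH_BYTES = 4  # input memory bus width given in the text


def bus_accesses(num_bytes: int, bus_width: int = BUS_WIDTH_BYTES) -> int:
    """Reads needed for a contiguous run of bytes that starts on a bus boundary."""
    return math.ceil(num_bytes / bus_width)


def half_pels(row: list[int]) -> list[int]:
    """Average each pair of adjacent pixels, rounding halves up (assumed rounding)."""
    return [(x + y + 1) // 2 for x, y in zip(row, row[1:])]


full_pels = 17                    # 17 full pels are needed for 16 half pels
assert len(half_pels(list(range(full_pels)))) == 16

luma = bus_accesses(full_pels)                          # one component: 5 reads
last_useful = full_pels - (luma - 1) * BUS_WIDTH_BYTES  # bytes used in the last read
uv_paired = bus_accesses(2 * full_pels)                 # u,v interleaved: 34 bytes
uv_separate = 2 * bus_accesses(full_pels)               # u and v stored apart

print(luma, last_useful, uv_paired, uv_separate)  # 5 1 9 10
```

Interleaving u and v lets the 2 spare bytes in the final u read be filled by v data. Storing u and v separately wastes 3 bytes in each component's last read.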

The combination of the above features maximizes the bandwidth and minimizes the memory requirement for our encoder design.
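The template-driven header generation described under "Generation of the MPEG Header" can be modeled in software to show how the status register, the valid bits, and the command codes interact. This is a hypothetical sketch, not the on-chip logic. It has the following limitations:

* The 190x27-bit buffer is modeled as a dictionary of `Entry` records.
* Bits are modeled as "0"/"1" strings.
* Only command codes 001, 010, and 110 are modeled.
* For code 010, it assumes that the 20 zero bits precede the data field.

All names are illustrative.

```python
from dataclasses import dataclass

# Three of the seven command codes listed in the text; the others are not modeled.
SHIP_DATA = 0b001        # ship content of data field into bitstream
ZEROS_THEN_DATA = 0b010  # 20 zero bits + data field (zeros-first order is assumed)
SHIP_AND_RESET = 0b110   # ship data field, then reset the valid bit


@dataclass
class Entry:
    valid: bool    # only valid entries are shipped
    length: int    # number of data bits to ship
    command: int   # 3-bit command code
    data: int      # data bits, right-aligned


def ship_header(entries: list[Entry]) -> str:
    """Process one header type's template entries and return its bits as '0'/'1' text."""
    out = []
    for e in entries:
        if not e.valid:
            continue
        value = e.data & ((1 << e.length) - 1)
        bits = format(value, f"0{e.length}b") if e.length else ""
        if e.command == ZEROS_THEN_DATA:
            bits = "0" * 20 + bits
        elif e.command == SHIP_AND_RESET:
            e.valid = False
        elif e.command != SHIP_DATA:
            raise NotImplementedError(f"command {e.command:03b} is not modeled")
        out.append(bits)
    return "".join(out)


def generate(status: str, buffer: dict[int, list[Entry]]) -> str:
    """Scan the status register left to right; a 1 in position i ships header type i."""
    return "".join(ship_header(buffer[i]) for i, bit in enumerate(status) if bit == "1")


# Hypothetical buffer holding three header types, and a status register selecting two.
buffer = {
    0: [Entry(True, 8, SHIP_DATA, 0xB3)],
    1: [Entry(True, 4, SHIP_DATA, 0xF)],
    2: [Entry(True, 4, ZEROS_THEN_DATA, 0b0001),
        Entry(False, 8, SHIP_DATA, 0xFF),
        Entry(True, 2, SHIP_AND_RESET, 0b10)],
}
bits = generate("101", buffer)
print(bits)
```

Header type 1 is skipped because its status bit is 0. Within header type 2, the entry whose valid bit is off contributes nothing. The last entry of header type 2 clears its own valid bit after shipping.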

While the invention has been described with respect to certain preferred embodiments and exemplifications, it is not intended to limit the scope of the invention thereby, but solely by the claims appended hereto.
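The field/frame variance comparison described under "Field and Frame Encoding" and in the claim below can be sketched in a few lines. This is an illustrative model with the following assumptions:

* The input is a 16x16 luminance macroblock stored in frame line order.
* Every pair of adjacent same-field lines is summed.
* It runs sequentially rather than on four parallel ALUs.
* Ties go to the field DCT, following "otherwise the Field Discrete Cosine Transform is selected".

The function names are hypothetical.

```python
from typing import Sequence

Macroblock = Sequence[Sequence[int]]  # 16 lines of 16 luminance pixels, frame line order


def line_sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two lines (subtract absolute, accumulate)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def frame_variance(mb: Macroblock) -> int:
    # Adjacent frame lines (0/1, 1/2, ...) come from alternating fields.
    return sum(line_sad(mb[r], mb[r + 1]) for r in range(len(mb) - 1))


def field_variance(mb: Macroblock) -> int:
    # Adjacent lines of the same field are two frame lines apart.
    # field1: frame lines 0, 2, 4, ...; field2: frame lines 1, 3, 5, ... (0-based).
    field1 = sum(line_sad(mb[r], mb[r + 2]) for r in range(0, len(mb) - 2, 2))
    field2 = sum(line_sad(mb[r], mb[r + 2]) for r in range(1, len(mb) - 2, 2))
    return field1 + field2


def dct_type(mb: Macroblock) -> str:
    """Frame DCT if the frame variance is smaller; otherwise field DCT."""
    return "frame" if frame_variance(mb) < field_variance(mb) else "field"


static = [[r] * 16 for r in range(16)]                          # smooth vertical ramp
moving = [[0 if r % 2 == 0 else 100] * 16 for r in range(16)]   # the two fields disagree
print(dct_type(static), dct_type(moving))  # frame field
```

For the smooth ramp, the frame variance is 240 and the field variance is 448, so frame DCT wins. When the two fields disagree, the frame variance is 24000 and the field variance is 0, so field DCT is selected.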

We claim: 

1. In a method of discrete cosine transform compression of a digital video image, the improvement comprising compressing the digital video image in a processor having four parallel ALUs for carrying out four calculations in one cycle, and further comprising the steps of:

(a) calculating field variance and frame variance of a macroblock, where

(1) the field variance is the summation of field1 variance and field2 variance, where

(i) the field1 variance is the summation of the absolute differences between all pairs of 2n+1 and 2n+3 lines in a frame structure calculated by subtract absolute and add accumulator instructions, wherein n is an even positive integer and

(ii) the field2 variance is the summation of the absolute differences between all pairs of 2m+2 and 2m+4 lines in the frame structure calculated by subtract absolute and add accumulator instructions, wherein m is an even positive integer; and

the processor either

(i) first addresses two vertically adjacent pixels on consecutive odd field lines and then switches to consecutive even field lines, or

(ii) first addresses two vertically adjacent pixels on consecutive even field lines and then switches to consecutive odd field lines;

(2) the frame variance is calculated by summation of the absolute differences between adjacent lines in the frame structure calculated by subtract absolute and add accumulator instructions, individual pixels thereof being addressed by the processor as two vertically adjacent pixels on consecutive odd and even field lines in the macroblock;

(b) performing field discrete cosine transform compression of the macroblock when the field variance of the macroblock is less than the frame variance of the macroblock; and

(c) performing frame discrete cosine transform compression of the macroblock when the frame variance of the macroblock is less than the field variance of the macroblock.

* * * * * 